310 research outputs found

    The benefits of adversarial defense in generalization

    Recent research has shown that models induced by machine learning, and in particular by deep learning, can be easily fooled by an adversary who carefully crafts modifications of the input data that are imperceptible, at least from the human perspective, or physically plausible. This discovery gave birth to a new field of research, adversarial machine learning, where new methods of attack and defense are developed continuously, mimicking what has long been happening in cybersecurity. In this paper we show that the drawbacks of inducing, from data, models less prone to being misled can actually provide some benefits when it comes to assessing their generalization abilities. We show these benefits both from a theoretical perspective, using state-of-the-art statistical learning theory, and with practical examples.

    ReForeSt: Random forests in Apache Spark

    Random Forests (RF) of tree classifiers are a popular ensemble method for classification. RF are usually preferred over other classification techniques because of their limited hyperparameter sensitivity, high numerical robustness, native capacity for dealing with numerical and categorical features, and effectiveness in many real-world classification problems. In this work we present ReForeSt, a Random Forests implementation for Apache Spark which is easier to tune, faster, and less memory-consuming than MLlib, the de facto standard Apache Spark machine learning library. We perform an extensive comparison between ReForeSt and MLlib by taking advantage of the Google Cloud Platform (https://cloud.google.com). In particular, we test ReForeSt and MLlib with different library settings, on different real-world datasets, and with different numbers of machines equipped with different numbers of cores. Results confirm that ReForeSt outperforms MLlib in all the above-mentioned aspects. ReForeSt is made publicly available via GitHub (https://github.com/alessandrolulli/reforest).

    Deep Learning for the Generation of Heuristics in Answer Set Programming: A Case Study of Graph Coloring

    Answer Set Programming (ASP) is a well-established declarative AI formalism for knowledge representation and reasoning. ASP systems have been successfully applied to both industrial and academic problems. Nonetheless, their performance can be improved by embedding domain-specific heuristics into their solving process. However, developing domain-specific heuristics often requires both deep knowledge of the domain at hand and a good understanding of the fundamental working principles of ASP solvers. In this paper, we investigate the use of deep learning techniques to automatically generate domain-specific heuristics for ASP solvers, targeting the well-known graph coloring problem. Empirical results show that the idea is promising: the performance of the ASP solver wasp can be improved.

    Randomized learning and generalization of fair and private classifiers: From PAC-Bayes to stability and differential privacy

    We address the problem of randomized learning and generalization of fair and private classifiers. On the one hand, we want to ensure that sensitive information does not unfairly influence the outcome of a classifier. On the other hand, we have to learn from data while preserving the privacy of individual observations. We first address this issue in the PAC-Bayes framework, presenting an approach which trades off and bounds the risk and the fairness of the randomized (Gibbs) classifier. Our new approach is able to handle several different state-of-the-art fairness measures. For this purpose, we further develop the idea that the PAC-Bayes prior can be defined based on the data-generating distribution without actually knowing it. In particular, we define a prior and a posterior which give more weight to functions with good generalization and fairness properties. Furthermore, we show that this randomized classifier possesses interesting stability properties using the algorithmic distribution stability theory. Finally, we show that the new posterior can be exploited to define a randomized, accurate, and fair algorithm. Differential privacy theory allows us to derive that the latter algorithm has interesting privacy-preserving properties, ensuring our threefold goal of good generalization, fairness, and privacy of the final model.

    Energy Efficient Smartphone-Based Activity Recognition Using Fixed-Point Arithmetic

    In this paper we propose a novel energy-efficient approach for the recognition of human activities using smartphones as wearable sensing devices, targeting assisted living applications such as remote patient activity monitoring for the disabled and the elderly. The method exploits fixed-point arithmetic to propose a modified multiclass Support Vector Machine (SVM) learning algorithm, which better preserves the smartphone battery lifetime compared with the conventional floating-point formulation while maintaining comparable system accuracy levels. Experiments show comparative results between this approach and the traditional SVM in terms of recognition performance and battery consumption, highlighting the advantages of the proposed method.
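
The general idea of replacing floating-point with fixed-point arithmetic in a linear multiclass classifier can be sketched as follows. This is only an illustration of integer-only scoring at prediction time, not the modified learning algorithm the abstract describes; the Q-format with 8 fractional bits and the helper names `to_fixed`, `predict_fixed`, and `predict_float` are assumptions made for the example.

```python
import numpy as np

FRAC_BITS = 8  # assumed Q-format: 8 fractional bits (hypothetical choice)


def to_fixed(values, frac_bits=FRAC_BITS):
    """Quantize real values to integers scaled by 2**frac_bits."""
    scaled = np.asarray(values, dtype=float) * (1 << frac_bits)
    return np.round(scaled).astype(np.int64)


def predict_fixed(W_fx, b_fx, x_fx, frac_bits=FRAC_BITS):
    """Pick the class with the largest linear score using integers only."""
    # The product W_fx @ x_fx carries 2 * frac_bits fractional bits,
    # so the bias is rescaled to the same scale before adding.
    scores = W_fx @ x_fx + b_fx * (1 << frac_bits)
    return int(np.argmax(scores))


def predict_float(W, b, x):
    """Floating-point reference for the same linear multiclass rule."""
    return int(np.argmax(np.asarray(W) @ np.asarray(x) + np.asarray(b)))


# Toy 3-class model with 2 features (made-up numbers for illustration).
W = [[1.0, -0.5], [-0.25, 0.75], [0.5, 0.5]]
b = [0.1, -0.2, 0.0]
x = [0.3, 0.9]

label_float = predict_float(W, b, x)
label_fixed = predict_fixed(to_fixed(W), to_fixed(b), to_fixed(x))
```

On this toy input both variants select the same class; with fewer fractional bits the quantization error grows and the two can disagree, which is the accuracy-versus-cost trade-off the abstract refers to.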

    Fair regression with Wasserstein barycenters

    We study the problem of learning a real-valued function that satisfies the Demographic Parity constraint, which demands that the distribution of the predicted output be independent of the sensitive attribute. We consider the case in which the sensitive attribute is available for prediction. We establish a connection between fair regression and optimal transport theory, based on which we derive a closed-form expression for the optimal fair predictor. Specifically, we show that the distribution of this optimum is the Wasserstein barycenter of the distributions induced by the standard regression function on the sensitive groups. This result offers an intuitive interpretation of the optimal fair prediction and suggests a simple post-processing algorithm to achieve fairness. We establish risk and distribution-free fairness guarantees for this procedure. Numerical experiments indicate that our method is very effective in learning fair models, with a relative increase in error rate that is smaller than the relative gain in fairness.
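
A post-processing step of the kind the abstract mentions can be sketched for one-dimensional outputs, where the Wasserstein-2 barycenter has a known form: its quantile function is the weighted average of the group quantile functions. The sketch below is a plain empirical version under that assumption, not the authors' estimator; the function names `fit_group_scores` and `fair_predict` and the toy data are hypothetical.

```python
import numpy as np


def fit_group_scores(scores, groups):
    """Store sorted base-regressor outputs per group and group frequencies."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    sorted_scores = {s: np.sort(scores[groups == s]) for s in labels}
    weights = {s: float(np.mean(groups == s)) for s in labels}
    return sorted_scores, weights


def fair_predict(score, group, sorted_scores, weights):
    """Map a base prediction to the barycenter of the group distributions."""
    own = sorted_scores[group]
    # Empirical CDF position of the score within its own group, in [0, 1].
    u = np.searchsorted(own, score, side="right") / len(own)
    # Barycenter quantile: weighted average of all group quantiles at level u.
    return sum(w * np.quantile(sorted_scores[s], u) for s, w in weights.items())


# Toy base predictions: group 1 is shifted by +10 relative to group 0.
scores = np.array([0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])

sorted_scores, weights = fit_group_scores(scores, groups)
fair = np.array([fair_predict(v, g, sorted_scores, weights)
                 for v, g in zip(scores, groups)])
fair_group0 = fair[groups == 0]
fair_group1 = fair[groups == 1]
```

In this toy case the two groups receive identical sets of adjusted predictions, so the output distribution no longer depends on the group, while the ranking of individuals within each group is preserved.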

    An Efficient Hybrid Planning Framework for In-Station Train Dispatching

    In-station train dispatching is the problem of optimising the effective utilisation of available railway infrastructure for mitigating incidents and delays. This is a fundamental problem for the efficiency of the whole railway network, and in turn for the transportation of goods and passengers, given that stations are among the most critical points in networks, since a high number of interconnections between trains' routes occur there. Despite this importance, in-station train dispatching is currently managed mainly by hand by human operators. In this paper we present a framework for solving in-station train dispatching problems, to support human operators in dealing with this task. We employ automated planning languages and tools for solving the task: PDDL+ for the specification of the problem, and the ENHSP planning engine, enhanced by domain-specific techniques, for solving the problem. We carry out an in-depth analysis using real data from a station in north-west Italy, which shows the effectiveness of our approach and the contribution that domain-specific techniques may have in efficiently solving the various instances of the problem. Finally, we also present a visualisation tool for graphically inspecting the generated plans.

    A communication platform demonstrator for new generation railway traffic management systems: Testing and validation

    Current rail traffic management and control systems cannot be easily upgraded to the new needs and challenges of modern railway systems because they do not offer interoperable data structures and standardized communication interfaces. To meet this need, the Horizon 2020 Shift2Rail OPTIMA project has developed a communication platform for testing and validating the new generation of traffic management systems (TMS), whose main innovative features are the interoperability of the data structures used, standardization of communications, continuous access to real-time and persistent data from heterogeneous data sources, modularity of components and scalability of the platform. This paper presents the main components, their functions and characteristics, then describes the testing and validation of the platform, even when federated with other innovative TMS modules developed in separate projects. The successful validation of the system has confirmed the achievement of the objectives set and allowed a new set of objectives to be defined for the reference platform for the railway TMS/Traffic Control systems

    So close so different: what makes the difference?

    The introduction of alien fish species in wetland ecosystems could have a great impact on freshwater communities and ecological processes. Although fish introduction has been recognized as one of the principal causes of freshwater extinctions, alteration of ecosystem processes, and change in aquatic community assemblage, very few data about its impact on freshwater reptiles are available. As a study model we used two neighbouring sub-populations of the endangered Sicilian pond turtle, Emys trinacris, inhabiting two small lakes that are close to each other and very similar, except for the presence of allochthonous fish, Cyprinus carpio and Gambusia holbrooki, in one of the two. The multi-year study highlighted significant differences in abundance, growth and reproductive output between the two freshwater turtle sub-populations, suggesting their influence on phenotypic plasticity of the studied population. These results are discussed in the light of previous evidence about the impact of these alien species on abundance and assemblage of the invertebrate community, with an evident impact on niche width, diet composition and therefore energy intake by Emys trinacris. These data may provide important information to guide management strategies and conservation actions for small wetland areas inhabited by pond turtles, pointing out a threat never highlighted up to now.